This project investigates how county-level conditions shape upward economic mobility for children from low-income families in the United States. Using data on the 1990 birth cohort, I analyze whether growing up in a high-poverty county inevitably leads to a lower adult income rank, or whether certain local characteristics can mitigate this disadvantage. The dataset combines the Opportunity Atlas mobility outcome (kfr_pooled_pooled_p25) with county covariates measuring poverty, employment, college attainment, income inequality, and family structure. Multiple regression models estimate the association of each factor with mobility, and the models are compared using AIC. The results suggest that, once employment, college attainment, and the other covariates are held constant, poverty share is not a statistically significant predictor of mobility (p = 0.091), whereas counties with more college-educated adults and higher employment rates have significantly higher mobility. These findings highlight the importance of local economic and educational resources in promoting intergenerational mobility, even in disadvantaged areas.
The data come from Opportunity Insights, a nonprofit research organization based at Harvard University that aims to expand economic opportunity in the United States by identifying barriers to upward mobility and developing solutions that help people rise out of poverty. My sample contains 3,115 observations, one for each of 3,115 of the 3,244 total counties in the United States.
Understanding the drivers of economic mobility is one of the central questions in economics. The motivation behind this project was to learn about these drivers and how they affect the Americans who most need upward mobility. Economic mobility reflects how well a society provides an equal playing field for all, regardless of where someone starts on the income ladder. Identifying which county-level characteristics are associated with worse adult outcomes can help policymakers and communities target efforts to mitigate the disadvantages of growing up poor.
The variables used in this analysis include the following:

- emp_pooled1990: Fraction of children (across all races/genders) from the 1990 birth cohort who are employed at age 27.
- hhinc_median_pooled1990: Median household income (in 2023 dollars) for the pooled population (all races/genders) in 1990.
- poor_share_pooled1990: Share of individuals below the federal poverty line (pooled, 1990).
- frac_coll_pooled1990: Fraction of people aged 25+ with a college degree (bachelor's or higher), pooled across races/genders, 1990.
- singlepar_pooled1990: Share of households with children under 18 that have a single parent (either female head/no husband or male head/no wife), pooled across races/genders, 1990.
- share_black1990: Fraction of population identified as Black in the 1990 Census.
- foreign_share1990: Fraction of residents who are foreign-born in 1990.
- gini1990: Gini coefficient measuring income inequality in the county in 1990 (higher values indicate more inequality).
- pop_pooled1990: Total county population in 1990.
There was relatively little missing data in the sample: the response variable (kfr_pooled_pooled_p25) had 61 missing values and one predictor (share_black1990) had 104. Because the sample is large, I dropped every row with a missing value (listwise deletion) rather than imputing it, so the analysis is based only on observed data.
| Variable | Mean | SD | Min | Max |
|---|---|---|---|---|
| kfr_pooled_pooled_p25 | 4.594614e-01 | 5.811120e-02 | 2.026000e-01 | 9.169000e-01 |
| emp_pooled1990 | 6.799673e-01 | 7.741390e-02 | 3.070022e-01 | 8.629962e-01 |
| hhinc_median_pooled1990 | 5.900564e+04 | 1.627552e+04 | 2.112592e+04 | 1.457160e+05 |
| poor_share_pooled1990 | 1.665286e-01 | 7.901360e-02 | 2.180170e-02 | 5.997913e-01 |
| frac_coll_pooled1990 | 1.352562e-01 | 6.592910e-02 | 3.689340e-02 | 5.341625e-01 |
| singlepar_pooled1990 | 2.033841e-01 | 6.656320e-02 | 4.802260e-02 | 6.015037e-01 |
| share_black1990 | 8.940260e-02 | 1.452094e-01 | 7.910000e-05 | 8.623599e-01 |
| foreign_share1990 | 7.179417e-01 | 1.520259e-01 | 1.346519e-01 | 9.723646e-01 |
| gini1990 | 4.240839e-01 | 3.797070e-02 | 2.712100e-01 | 5.924208e-01 |
| pop_pooled1990 | 7.989196e+04 | 2.648273e+05 | 6.750000e+02 | 8.863164e+06 |
The first regression model predicts kfr_pooled_pooled_p25: the mean percentile rank at age 27 in the national household income distribution, relative to other children born in the same year, for children whose parents were at the 25th percentile of national income.
Each coefficient gives the expected difference in this income rank for a one-unit difference in that characteristic, holding the other predictors constant. A positive coefficient means the characteristic is associated with higher mobility; a negative coefficient means it is associated with lower mobility.
The model is given by:
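$$
\begin{aligned}
\widehat{\text{kfr\_pooled\_pooled\_p25}} = {} & 0.5079 + 0.1875\,\text{emp\_pooled1990} - 1.044\times10^{-6}\,\text{hhinc\_median\_pooled1990} \\
& + 0.03723\,\text{poor\_share\_pooled1990} + 0.1123\,\text{frac\_coll\_pooled1990} \\
& - 0.4581\,\text{singlepar\_pooled1990} - 0.05009\,\text{share\_black1990} \\
& + 0.003764\,\text{foreign\_share1990} - 0.09823\,\text{gini1990} \\
& + 7.532\times10^{-9}\,\text{pop\_pooled1990}
\end{aligned}
$$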
According to the regression model, and holding the other predictors constant, counties with higher employment rates, a greater fraction of college graduates, and larger populations tend to have higher income mobility for children from low-income families. Counties with a higher share of single-parent households, a larger Black population share, higher income inequality, and higher median household income tend to have lower upward mobility. All of these associations are statistically significant at the 5% level, whereas poverty share (p = 0.091) and foreign-born share (p = 0.497) are not. Statistical significance indicates that an association is unlikely to be due to chance, not that it is causal, so these variables are best interpreted as correlates of mobility for children from low-income families across all counties rather than proven causes.
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 0.508 | 0.020 | 25.242 | 0.000 |
| emp_pooled1990 | 0.187 | 0.015 | 12.329 | 0.000 |
| hhinc_median_pooled1990 | -1.044e-06 | 0.000 | -11.390 | 0.000 |
| poor_share_pooled1990 | 0.037 | 0.022 | 1.688 | 0.091 |
| frac_coll_pooled1990 | 0.112 | 0.017 | 6.704 | 0.000 |
| singlepar_pooled1990 | -0.458 | 0.018 | -25.735 | 0.000 |
| share_black1990 | -0.050 | 0.008 | -6.605 | 0.000 |
| foreign_share1990 | 0.004 | 0.006 | 0.679 | 0.497 |
| gini1990 | -0.098 | 0.032 | -3.047 | 0.002 |
| pop_pooled1990 | 7.532e-09 | 0.000 | 2.588 | 0.010 |
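To test whether some county characteristics offset the effects of poverty, I estimated the regression model with and without the college-educated share and the employment rate. I fit both versions on the same observations and compared them using AIC, where lower values indicate a better trade-off between fit and complexity. AIC rose sharply when these two variables were removed. This suggests that growing up in a poorer county does not necessarily mean lower upward mobility: at a given poverty rate, counties with more college-educated adults and higher employment tend to have higher mobility. Because the models contain no interaction terms, this evidence concerns the additional mobility associated with education and employment at every poverty level, not a change in the effect of poverty itself.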
| Model | Description | AIC |
|---|---|---|
| Model 1 | All predictors | -10901.95 |
| Model 2 | No college, no employment | -10658.01 |
As the table shows, the full model's AIC is about 244 units lower (-10901.95 versus -10658.01). Adding education and employment therefore substantially improves fit, even after AIC's penalty for the two extra parameters. According to the bar chart below, the college-educated share and the employment rate have larger estimated coefficients than the poverty rate. This suggests they play a more important role in supporting future economic success.
In doing this project, I learned a lot about which variables help predict upward mobility for low-income children in America. While poverty remains a barrier, this analysis suggests that its negative effect can be substantially offset in counties with higher employment and higher levels of education. However, several limitations should be noted. The model relies on observational, cross-sectional data, which limits our ability to make strong causal claims. The dataset also omits potentially important factors such as school quality and neighborhood effects. Additionally, even the small amount of missing data could bias the estimates if the dropped counties differ systematically from the rest. Overall, these findings offer useful insight into the kinds of actions policymakers and communities can take to promote upward mobility in the areas of America that need it most.
Opportunity Insights. (2024). Codebook for Table 3: County-Level Outcomes by Birth Cohort, Parental Income, Race, and Gender. https://opportunityinsights.org/wp-content/uploads/2024/07/ChangingOpportunity_Codebook_Table_3_County_by_Cohort_Estimates.pdf
Opportunity Insights. (2024). Codebook for Table 8: County-level Covariates. https://opportunityinsights.org/wp-content/uploads/2024/07/ChangingOpportunity_Codebook_Table_8_County_Covariates.pdf
---
title: "Drivers of Economic Mobility"
output:
  flexdashboard::flex_dashboard:
    theme: simplex
    orientation: columns
    vertical_layout: fill
    source_code: embed
---
```{r setup, include=FALSE}
library(flexdashboard)
library(broom)
library(knitr)
library(corrplot)
library(tidyverse)
library(dplyr)
library(MASS)  # stepAIC(); masks dplyr::select(), so select() is called as dplyr::select()
library(ggplot2)
library(maps)
library(gridExtra)

# Opportunity Insights Table 3 (county outcomes by cohort) and Table 8 (county covariates)
df1 <- read_csv("~/Downloads/county_by_cohort_estimates.csv")
df2 <- read_csv("~/Downloads/Table_8_county_covariates.csv")

# Outcome for the 1990 birth cohort
outcomes_small <- df1 %>%
  filter(cohort == 1990) %>%
  dplyr::select(state, county, state_name, county_name,
                kfr_pooled_pooled_p25)

# 1990 county-level predictors
covars_small <- df2 %>%
  dplyr::select(
    state, county,
    emp_pooled1990,
    hhinc_median_pooled1990,
    poor_share_pooled1990,
    frac_coll_pooled1990,
    singlepar_pooled1990,
    share_black1990,
    foreign_share1990,
    gini1990,
    pop_pooled1990
  )

# Keep counties that appear in both files
df <- outcomes_small %>%
  inner_join(covars_small, by = c("state", "county"))

# Missing values per column, then listwise deletion
colSums(is.na(df))
df_complete <- df[complete.cases(df), ]
colSums(is.na(df_complete))

# Model 1: all predictors
model1 <- lm(kfr_pooled_pooled_p25 ~ emp_pooled1990 + hhinc_median_pooled1990 +
               poor_share_pooled1990 + frac_coll_pooled1990 + singlepar_pooled1990 +
               share_black1990 + foreign_share1990 + gini1990 + pop_pooled1990,
             data = df_complete)

# Model 2: drops college share and employment rate (fit on the same rows as Model 1)
model2 <- lm(kfr_pooled_pooled_p25 ~ hhinc_median_pooled1990 +
               poor_share_pooled1990 + singlepar_pooled1990 +
               share_black1990 + foreign_share1990 + gini1990 + pop_pooled1990,
             data = df_complete)

quant_vars <- c("emp_pooled1990", "hhinc_median_pooled1990", "poor_share_pooled1990",
                "frac_coll_pooled1990", "singlepar_pooled1990", "share_black1990",
                "foreign_share1990", "gini1990", "pop_pooled1990")

# Stepwise AIC selection in both directions, starting from each model
stepwise_aic  <- stepAIC(model1, direction = "both", trace = TRUE)
stepwise_aic2 <- stepAIC(model2, direction = "both", trace = TRUE)

# Wide summary: one column per variable/statistic pair
summary_table <- df %>%
  dplyr::select(all_of(c("kfr_pooled_pooled_p25", quant_vars))) %>%
  summarise(across(everything(),
                   list(Mean = ~mean(., na.rm = TRUE),
                        SD   = ~sd(., na.rm = TRUE),
                        Min  = ~min(., na.rm = TRUE),
                        Max  = ~max(., na.rm = TRUE)),
                   .names = "{.col}_{.fn}"))

# Long summary: one row per variable (displayed on the Summaries tab)
summary_long <- data.frame(
  Variable = c("kfr_pooled_pooled_p25", quant_vars),
  Mean = c(mean(df$kfr_pooled_pooled_p25, na.rm = TRUE),
           sapply(df[quant_vars], function(x) mean(x, na.rm = TRUE))),
  SD   = c(sd(df$kfr_pooled_pooled_p25, na.rm = TRUE),
           sapply(df[quant_vars], function(x) sd(x, na.rm = TRUE))),
  Min  = c(min(df$kfr_pooled_pooled_p25, na.rm = TRUE),
           sapply(df[quant_vars], function(x) min(x, na.rm = TRUE))),
  Max  = c(max(df$kfr_pooled_pooled_p25, na.rm = TRUE),
           sapply(df[quant_vars], function(x) max(x, na.rm = TRUE)))
)

# Pairwise correlations using only rows with no missing values
cor_matrix <- cor(df[, c("kfr_pooled_pooled_p25", quant_vars)], use = "complete.obs")

# County polygons from the maps package, keyed by lowercase state and county names
county_map <- map_data("county")
county_map <- county_map %>%
  mutate(
    state  = tolower(region),
    county = tolower(subregion)
  )
df_map <- df %>%
  mutate(
    state  = tolower(state_name),
    county = tolower(county_name)
  )
plot_data <- inner_join(county_map, df_map, by = c("state", "county"))

coef_table <- tidy(model1)

# AIC comparison of the full and reduced models
aic1 <- AIC(model1)
aic2 <- AIC(model2)
aic_table <- data.frame(
  Model       = c("Model 1", "Model 2"),
  Description = c("All predictors", "No college, no employment"),
  AIC         = c(aic1, aic2)
)

# Coefficients plotted in the Research question 2 bar chart
main_effects <- data.frame(
  Variable = c("Poverty Rate", "College-Educated Share (age 25+)", "Emp. Rate at age 27"),
  Coefficient = c(
    coef(model1)[["poor_share_pooled1990"]],
    coef(model1)[["frac_coll_pooled1990"]],
    coef(model1)[["emp_pooled1990"]]
  )
)
```
Introduction
===
Column {data-width=1300}
---
### Abstract
This project investigates how county-level conditions shape upward economic mobility for children from low-income families in the United States. Using data on the 1990 birth cohort, I analyze whether growing up in a high-poverty county inevitably leads to a lower adult income rank, or whether certain local characteristics can mitigate this disadvantage. The dataset combines the Opportunity Atlas mobility outcome (kfr_pooled_pooled_p25) with county covariates measuring poverty, employment, college attainment, income inequality, and family structure. Multiple regression models estimate the association of each factor with mobility, and the models are compared using AIC. The results suggest that, once employment, college attainment, and the other covariates are held constant, poverty share is not a statistically significant predictor of mobility (p = 0.091), whereas counties with more college-educated adults and higher employment rates have significantly higher mobility. These findings highlight the importance of local economic and educational resources in promoting intergenerational mobility, even in disadvantaged areas.
Column {data-width=1000}
-----------------------------------------------------------------------
### Research Questions
* Which county-level characteristics are associated with higher income mobility for children from low-income families?
* Does growing up in a poorer county always mean lower upward mobility, or do some county characteristics offset the effects of poverty?
### Source
The data come from Opportunity Insights, a nonprofit research organization based at Harvard University that aims to expand economic opportunity in the United States by identifying barriers to upward mobility and developing solutions that help people rise out of poverty. My sample contains 3,115 observations, one for each of 3,115 of the 3,244 total counties in the United States.
Column {data-width=1100}
---
### Background/Significance
Understanding the drivers of economic mobility is one of the central questions in economics. The motivation behind this project was to learn about these drivers and how they affect the Americans who most need upward mobility. Economic mobility reflects how well a society provides an equal playing field for all, regardless of where someone starts on the income ladder. Identifying which county-level characteristics are associated with worse adult outcomes can help policymakers and communities target efforts to mitigate the disadvantages of growing up poor.
Data Description and EDA
===
Column {.tabset}
---
### Variables Used
The variables used in this analysis include the following:
#### Response variable:
- kfr_pooled_pooled_p25: Mean percentile rank, relative to other children born the same year, in the national distribution of household income at age 27, for children whose parents are at the 25th percentile of national income, pooled across races and genders.
#### Explanatory variables:
- emp_pooled1990: Fraction of children (across all races/genders) from the 1990 birth cohort who are employed at age 27.
- hhinc_median_pooled1990: Median household income (in 2023 dollars) for the pooled population (all races/genders) in 1990.
- poor_share_pooled1990: Share of individuals below the federal poverty line (pooled, 1990).
- frac_coll_pooled1990: Fraction of people aged 25+ with a college degree (bachelor's or higher), pooled across races/genders, 1990.
- singlepar_pooled1990: Share of households with children under 18 that have a single parent (either female head/no husband or male head/no wife), pooled across races/genders, 1990.
- share_black1990: Fraction of population identified as Black in the 1990 Census.
- foreign_share1990: Fraction of residents who are foreign-born in 1990.
- gini1990: Gini coefficient measuring income inequality in the county in 1990 (higher values indicate more inequality).
- pop_pooled1990: Total county population in 1990.
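As a reminder of what gini1990 captures, here is a minimal R sketch of the textbook Gini coefficient: the mean absolute difference between all pairs of incomes, divided by twice the mean income. The function name `gini_coef` and the toy incomes are hypothetical. Opportunity Insights may compute its county Gini from binned or top-coded census data, so this illustrates the statistic rather than reproducing gini1990.

```r
# Hypothetical helper: textbook Gini coefficient for a vector of incomes.
# G = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean(x)); 0 means perfect equality.
gini_coef <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  # outer() builds the n x n matrix of pairwise income differences
  sum(abs(outer(x, x, "-"))) / (2 * n^2 * mean(x))
}

gini_coef(c(50, 50, 50, 50))  # everyone earns the same: 0
gini_coef(c(0, 0, 0, 100))    # one person has all income: (n - 1) / n = 0.75
```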
#### Missing Data
There was relatively little missing data in the sample: the response variable (kfr_pooled_pooled_p25) had 61 missing values and one predictor (share_black1990) had 104. Because the sample is large, I dropped every row with a missing value (listwise deletion) rather than imputing it, so the analysis is based only on observed data.
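Listwise deletion keeps only rows with no missing value in any column, which is what `complete.cases()` does in the setup chunk. The minimal sketch below uses a hypothetical toy data frame with made-up values. It shows how to count the rows that are dropped and why a row missing only one predictor is still removed. Applied to the county data, this means at most 61 + 104 = 165 counties are dropped, and fewer if some counties are missing both values.

```r
# Hypothetical toy data: NA in the outcome (row 2) and in a predictor (row 4)
toy <- data.frame(
  kfr_pooled_pooled_p25 = c(0.45, NA, 0.50, 0.41, 0.47),
  share_black1990       = c(0.10, 0.02, 0.30, NA, 0.05)
)

colSums(is.na(toy))                      # one missing value in each column
toy_complete <- toy[complete.cases(toy), ]
nrow(toy) - nrow(toy_complete)           # rows dropped by listwise deletion: 2
```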
### Summaries
#### Summary Statistic Table
```{r}
kable(summary_long, format = "simple")
```
#### Choropleth map of all U.S. counties
```{r}
ggplot(plot_data, aes(long, lat, group = group, fill = kfr_pooled_pooled_p25)) +
  # county borders; `linewidth` replaces the deprecated `size` for lines in ggplot2 >= 3.4.0
  geom_polygon(color = "white", linewidth = 0.1) +
  coord_fixed(1.3) +
  scale_fill_viridis_c(option = "plasma") +
  theme_void() +
  labs(title = "Income Mobility by County")
```
Methods used
===
### Regression Model
The first regression model predicts kfr_pooled_pooled_p25: the mean percentile rank at age 27 in the national household income distribution, relative to other children born in the same year, for children whose parents were at the 25th percentile of national income.
Each coefficient gives the expected difference in this income rank for a one-unit difference in that characteristic, holding the other predictors constant. A positive coefficient means the characteristic is associated with higher mobility; a negative coefficient means it is associated with lower mobility.
The model is given by:
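$$
\begin{aligned}
\widehat{\text{kfr\_pooled\_pooled\_p25}} = {} & 0.5079 + 0.1875\,\text{emp\_pooled1990} - 1.044\times10^{-6}\,\text{hhinc\_median\_pooled1990} \\
& + 0.03723\,\text{poor\_share\_pooled1990} + 0.1123\,\text{frac\_coll\_pooled1990} \\
& - 0.4581\,\text{singlepar\_pooled1990} - 0.05009\,\text{share\_black1990} \\
& + 0.003764\,\text{foreign\_share1990} - 0.09823\,\text{gini1990} \\
& + 7.532\times10^{-9}\,\text{pop\_pooled1990}
\end{aligned}
$$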
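The outcome and most predictors are fractions on a 0 to 1 scale. A "unit change" in a predictor therefore means moving from 0 to 100 percent, which is far outside the observed range. The minimal R sketch below uses the fitted coefficients to rescale a few of them into more interpretable changes, expressed in percentile points. As a sanity check, it also evaluates the equation at the sample means from the summary statistics table. Those means were computed before incomplete rows were dropped, so the match with the mean outcome is only approximate. The object names `b` and `x_bar` are illustrative.

```r
# Coefficients copied from the fitted equation for model1
b <- c(
  "(Intercept)"           = 0.5079,
  emp_pooled1990          = 0.1875,
  hhinc_median_pooled1990 = -1.044e-06,
  poor_share_pooled1990   = 0.03723,
  frac_coll_pooled1990    = 0.1123,
  singlepar_pooled1990    = -0.4581,
  share_black1990         = -0.05009,
  foreign_share1990       = 0.003764,
  gini1990                = -0.09823,
  pop_pooled1990          = 7.532e-09
)

# A 10-percentage-point (0.10) increase, converted to percentile points (x 100)
b[c("emp_pooled1990", "frac_coll_pooled1990", "singlepar_pooled1990")] * 0.10 * 100

# $10,000 higher median household income, in percentile points
b[["hhinc_median_pooled1990"]] * 10000 * 100

# Sample means from the summary statistics table, in the same order as b
x_bar <- c(1, 0.6799673, 59005.64, 0.1665286, 0.1352562, 0.2033841,
           0.0894026, 0.7179417, 0.4240839, 79891.96)
sum(b * x_bar)  # about 0.459, close to the mean outcome of 0.4595
```

Read this way, a 10-point higher single-parent share is associated with roughly 4.6 percentile points lower mobility. A 10-point higher employment rate is associated with about 1.9 points higher mobility.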
### Diagnostic plots
```{r, fig.width=20, fig.height=5}
par(mfrow = c(1, 4))
par(mar = c(4, 4, 2, 1))
plot(model1)
par(mfrow = c(1, 1))
```
Research question 1
===
#### Which county-level characteristics are associated with higher income mobility for children from low-income families?
According to the regression model, and holding the other predictors constant, counties with higher employment rates, a greater fraction of college graduates, and larger populations tend to have higher income mobility for children from low-income families. Counties with a higher share of single-parent households, a larger Black population share, higher income inequality, and higher median household income tend to have lower upward mobility. All of these associations are statistically significant at the 5% level, whereas poverty share (p = 0.091) and foreign-born share (p = 0.497) are not. Statistical significance indicates that an association is unlikely to be due to chance, not that it is causal, so these variables are best interpreted as correlates of mobility for children from low-income families across all counties rather than proven causes.
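The claim about which predictors matter can be made explicit by filtering the coefficient table on a significance threshold. The minimal sketch below assumes the conventional 5% level and types in the rounded t statistics and p-values from the regression coefficient table. With the fitted model, `broom::tidy(model1)` supplies these columns directly. The sign of the t statistic is used for direction because two estimates round to 0.000 in the displayed table.

```r
# Rounded results typed in from the regression coefficient table (model1)
coefs <- data.frame(
  term = c("emp_pooled1990", "hhinc_median_pooled1990", "poor_share_pooled1990",
           "frac_coll_pooled1990", "singlepar_pooled1990", "share_black1990",
           "foreign_share1990", "gini1990", "pop_pooled1990"),
  statistic = c(12.329, -11.390, 1.688, 6.704, -25.735, -6.605, 0.679, -3.047, 2.588),
  p.value   = c(0.000, 0.000, 0.091, 0.000, 0.000, 0.000, 0.497, 0.002, 0.010)
)

alpha <- 0.05
sig <- subset(coefs, p.value < alpha)
# Direction of each significant association, from the sign of its t statistic
sig$direction <- ifelse(sig$statistic > 0, "higher mobility", "lower mobility")
sig[order(-abs(sig$statistic)), c("term", "direction")]

subset(coefs, p.value >= alpha)$term  # poverty share and foreign-born share
```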
#### Correlation Heatmap
```{r}
corrplot(cor_matrix, method = "color")
```
```{r}
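coef_table %>%
  # show tiny coefficients (income, population) in significant digits instead of 0.000
  mutate(across(c(estimate, std.error), ~ formatC(.x, format = "g", digits = 4))) %>%
  kable(digits = 3, caption = "Regression Coefficient Table")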
```
Research question 2
===
#### Does growing up in a poorer county always mean lower upward mobility, or do some county characteristics offset the effects of poverty?
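To answer this question, I estimated the regression model with and without the college-educated share and the employment rate. I fit both versions on the same observations and compared them using AIC, where lower values indicate a better trade-off between fit and complexity. AIC rose sharply when these two variables were removed. This suggests that growing up in a poorer county does not necessarily mean lower upward mobility: at a given poverty rate, counties with more college-educated adults and higher employment tend to have higher mobility. Because the models contain no interaction terms, this evidence concerns the additional mobility associated with education and employment at every poverty level, not a change in the effect of poverty itself.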
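For a linear model, AIC = -2 log L + 2k, where L is the maximized likelihood and k counts the estimated coefficients plus the residual variance. Dropping two predictors therefore lowers the penalty by only 4, but it can raise -2 log L by much more. The minimal sketch below runs on simulated data; the variables `x1`, `x2`, and `x3` are hypothetical and not the county data. It checks that formula against R's `AIC()` and then computes the difference between the two values in the AIC comparison table.

```r
set.seed(1)
n  <- 500
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.8 * x2 + 0.3 * x3 + rnorm(n)

full    <- lm(y ~ x1 + x2 + x3)
reduced <- lm(y ~ x3)             # drops two informative predictors

# AIC by hand: k = number of coefficients + 1 (for the residual variance)
ll <- logLik(full)
aic_by_hand <- -2 * as.numeric(ll) + 2 * attr(ll, "df")
all.equal(aic_by_hand, AIC(full))  # TRUE

AIC(reduced) - AIC(full)           # positive: the full model fits better

# The same difference for the county models, from the AIC comparison table
-10658.01 - (-10901.95)            # 243.94
```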
```{r}
kable(aic_table, caption = "AIC Comparison of Regression Models")
```
As the table shows, the full model's AIC is about 244 units lower (-10901.95 versus -10658.01). Adding education and employment therefore substantially improves fit, even after AIC's penalty for the two extra parameters. According to the bar chart below, the college-educated share and the employment rate have larger estimated coefficients than the poverty rate. This suggests they play a more important role in supporting future economic success.
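Raw coefficients are only comparable across predictors when those predictors vary on similar scales. A common check is to multiply each coefficient by its predictor's standard deviation, which gives the expected change in the outcome for a one-standard-deviation difference. The minimal sketch below does this, assuming the standard deviations in the summary statistics table are close to those of the estimation sample. With the fitted model, the analogous line would be `coef(model1)[vars] * sapply(df_complete[vars], sd)`.

```r
# Coefficients from model1 and standard deviations from the summary statistics table
vars <- c("poor_share_pooled1990", "frac_coll_pooled1990", "emp_pooled1990")
b    <- c(0.03723, 0.1123, 0.1875)
sds  <- c(0.0790136, 0.0659291, 0.0774139)

# Change in the outcome (0-1 rank scale) per 1-SD difference, shown in percentile points
std_effect <- setNames(b * sds, vars)
round(std_effect * 100, 2)  # poverty 0.29, college 0.74, employment 1.45
```

On this scale, a one-SD higher employment rate is associated with roughly 1.5 percentile points higher mobility. That is about twice the college-share association and about five times the (statistically insignificant) poverty-share coefficient.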
```{r, fig.width= 10, fig.height=5}
ggplot(main_effects, aes(x = Variable, y = Coefficient, fill = Variable)) +
  geom_col(width = 0.7) +
  labs(title = "Main County-Level Effects on Upward Mobility",
       y = "Estimated Coefficient",
       x = "") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set2") +
  geom_text(aes(label = round(Coefficient, 3)), vjust = -0.5)
```
Conclusion
===
### Discussion
In doing this project, I learned a lot about which variables help predict upward mobility for low-income children in America. While poverty remains a barrier, this analysis suggests that its negative effect can be substantially offset in counties with higher employment and higher levels of education. However, several limitations should be noted. The model relies on observational, cross-sectional data, which limits our ability to make strong causal claims. The dataset also omits potentially important factors such as school quality and neighborhood effects. Additionally, even the small amount of missing data could bias the estimates if the dropped counties differ systematically from the rest. Overall, these findings offer useful insight into the kinds of actions policymakers and communities can take to promote upward mobility in the areas of America that need it most.
### About the Author
My name is Scott Robbins. I am currently pursuing a Bachelor of Arts in Economics with a minor in data analytics.
### References
Opportunity Insights. (2024). Codebook for Table 3: County-Level Outcomes by Birth Cohort, Parental Income, Race, and Gender. https://opportunityinsights.org/wp-content/uploads/2024/07/ChangingOpportunity_Codebook_Table_3_County_by_Cohort_Estimates.pdf
Opportunity Insights. (2024). Codebook for Table 8: County-level Covariates. https://opportunityinsights.org/wp-content/uploads/2024/07/ChangingOpportunity_Codebook_Table_8_County_Covariates.pdf